Multi-Modal Scene Interpretation

Authors

  • Michael Wünstel
  • Thomas Röfer
Abstract

The visionary goal of developing an easy-to-use service robot implies several key tasks such as speech understanding, object recognition, and scene understanding. Beyond these sensor-oriented capabilities, such systems need extensive meta knowledge, e.g., about mental representations of spatial relations, in order to align the views of man and machine. Only if all parts fit together can unrestricted man-machine communication be established. A cognitive system therefore has to integrate many different components, both in the technical sense and especially within the cognition models [1]. In particular, when a perceptive component is connected to a spatial reasoning component via a speech recognition and synthesis component, the probabilistic domain of object recognition has to be coupled with the logical domain of formal reasoning. The cognitive vision system ORCC presented here combines diverse recognition strategies to produce an extensive description of an unconstrained scene: in a first step, the room demarcations and structurally simple objects such as tables are extracted using both functional and structural properties. Further objects are then segmented based on their position, followed by a structurally more complex and a more shape-oriented recognition step. Next, this spatial information is enriched with colour-based information about the objects. Finally, the resulting scene description can be used as input for a speech-based man-machine dialogue about the objects within the scene [2].
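
The staged recognition pipeline sketched in the abstract can be summarised schematically. The following Python sketch is purely illustrative, assuming hypothetical data structures and stage functions (SceneObject, extract_room_structure, segment_by_position, and so on); it mirrors the order of the steps described above rather than the actual ORCC implementation.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SceneObject:
    label: str                            # e.g. "table", "unknown"
    position: Tuple[float, float, float]  # (x, y, z) in the room frame
    colour: Optional[str] = None

@dataclass
class SceneDescription:
    room_planes: List[str] = field(default_factory=list)   # floor, walls, ceiling
    objects: List[SceneObject] = field(default_factory=list)

# Placeholder stages; each stands in for one step of the pipeline described above.

def extract_room_structure(range_data) -> List[str]:
    # Step 1a: extract room demarcations (floor, walls) from the range data.
    return ["floor", "wall"]

def detect_simple_objects(range_data, room_planes) -> List[SceneObject]:
    # Step 1b: detect structurally simple objects such as tables,
    # using both functional and structural properties.
    return [SceneObject("table", (1.0, 0.5, 0.75))]

def segment_by_position(range_data, known_objects) -> List[SceneObject]:
    # Step 2: segment further object candidates by their position,
    # e.g. point clusters resting on a table surface.
    return [SceneObject("unknown", (1.0, 0.5, 0.80))]

def recognise_by_shape(candidates) -> List[SceneObject]:
    # Step 3: structurally more complex, shape-oriented recognition.
    for obj in candidates:
        obj.label = "cup"                 # placeholder classification
    return candidates

def extract_dominant_colour(colour_image, position) -> str:
    # Step 4: enrich the spatial description with colour information.
    return "red"

def interpret_scene(range_data, colour_image) -> SceneDescription:
    scene = SceneDescription()
    scene.room_planes = extract_room_structure(range_data)
    scene.objects += detect_simple_objects(range_data, scene.room_planes)
    candidates = segment_by_position(range_data, scene.objects)
    scene.objects += recognise_by_shape(candidates)
    for obj in scene.objects:
        obj.colour = extract_dominant_colour(colour_image, obj.position)
    # The resulting description can feed a speech-based man-machine dialogue.
    return scene

if __name__ == "__main__":
    print(interpret_scene(range_data=None, colour_image=None))

In such a staged design each stage consumes only the output of the previous ones, which keeps the probabilistic recognition steps separate from the later symbolic reasoning and dialogue layers.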


Related articles

Multi-modal Data Fusion Techniques and Applications

In recent years, camera networks have been widely employed in several application domains such as surveillance, ambient intelligence or video conferencing. The integration of heterogeneous sensors can provide complementary and redundant information that, when fused with visual cues, allows the system to obtain an enriched and more robust scene interpretation. A discussion about possible architectures an...

Grouping Over Stereo for Visual Cues Disambiguation

In stereo vision, the goal is to reconstruct the three-dimensional structure of the scene observed from two camera inputs. The core problems are the matching of features between the two camera frames, and the interpretation of image features in terms of the 3D scene. In this paper, we use a rating scheme for the potential correspondences, based on the multi-modal intrinsic similarity of the features. ...

Cross-calibration of time-of-flight and colour cameras

Time-of-flight cameras provide depth information, which is complementary to the photometric appearance of the scene in ordinary images. It is desirable to merge the depth and colour information, in order to obtain a coherent scene representation. However, the individual cameras will have different viewpoints, resolutions and fields of view, which means that they must be mutually calibrated. Thi...

Multi-modal Auto-Encoders as Joint Estimators for Robotics Scene Understanding

We explore the capabilities of Auto-Encoders to fuse the information available from cameras and depth sensors, and to reconstruct missing data, for scene understanding tasks. In particular we consider three input modalities: RGB images; depth images; and semantic label information. We seek to generate complete scene segmentations and depth maps, given images and partial and/or noisy depth and s...

Analyzing the Affect of a Group of People Using Multi-modal Framework

Millions of images on the web enable us to explore images from social events such as a family party; thus, it is of interest to understand and model the affect exhibited by a group of people in images. However, analysis of the affect expressed by multiple people is challenging due to varied indoor and outdoor settings, and interactions taking place between various numbers of people. A few existing wo...

Unifying Registration and Segmentation for Multi-sensor Images

We propose a method for unifying registration and segmentation of multi-modal images assuming that the hidden scene model is a Gibbs probability distribution.

Journal:
  • KI

Volume 22, Issue

Pages -

Publication date: 2008